ESS330 Daily Assignment 21

Lecture 21: Introduction to Time Series Data in ‘R’

Author

Neva Morgan

Published

April 21, 2025

Objective:

In this activity, you will download streamflow data from the Cache la Poudre River (USGS site 06752260) and analyze it using a few time series methods.

Setting Up:

library(zoo)
Warning: package 'zoo' was built under R version 4.4.3

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
library(timeSeries) 
Warning: package 'timeSeries' was built under R version 4.4.3
Loading required package: timeDate

Attaching package: 'timeSeries'
The following object is masked from 'package:zoo':

    time<-
The following objects are masked from 'package:graphics':

    lines, points
# Note: there is no separate 'ts' package to install; the ts class and its functions are part of the stats package that ships with R.
library(xts)
Warning: package 'xts' was built under R version 4.4.3
library(tidyverse)
Warning: package 'tidyverse' was built under R version 4.4.3
Warning: package 'ggplot2' was built under R version 4.4.3
Warning: package 'tidyr' was built under R version 4.4.3
Warning: package 'readr' was built under R version 4.4.3
Warning: package 'purrr' was built under R version 4.4.3
Warning: package 'dplyr' was built under R version 4.4.3
Warning: package 'lubridate' was built under R version 4.4.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks timeSeries::filter(), stats::filter()
✖ dplyr::first()  masks xts::first()
✖ dplyr::lag()    masks timeSeries::lag(), stats::lag()
✖ dplyr::last()   masks xts::last()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(tidymodels)
Warning: package 'tidymodels' was built under R version 4.4.3
── Attaching packages ────────────────────────────────────── tidymodels 1.3.0 ──
✔ broom        1.0.8     ✔ rsample      1.3.0
✔ dials        1.4.0     ✔ tune         1.3.0
✔ infer        1.0.8     ✔ workflows    1.2.0
✔ modeldata    1.4.0     ✔ workflowsets 1.1.0
✔ parsnip      1.3.1     ✔ yardstick    1.3.2
✔ recipes      1.3.0     
Warning: package 'broom' was built under R version 4.4.3
Warning: package 'dials' was built under R version 4.4.3
Warning: package 'infer' was built under R version 4.4.3
Warning: package 'parsnip' was built under R version 4.4.3
Warning: package 'recipes' was built under R version 4.4.3
Warning: package 'rsample' was built under R version 4.4.3
Warning: package 'tune' was built under R version 4.4.3
Warning: package 'workflows' was built under R version 4.4.3
Warning: package 'yardstick' was built under R version 4.4.3
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks timeSeries::filter(), stats::filter()
✖ dplyr::first()    masks xts::first()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks timeSeries::lag(), stats::lag()
✖ dplyr::last()     masks xts::last()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
library(ggplot2)
library(tsibble)
Warning: package 'tsibble' was built under R version 4.4.3
Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr

Attaching package: 'tsibble'

The following object is masked from 'package:lubridate':

    interval

The following object is masked from 'package:zoo':

    index

The following objects are masked from 'package:base':

    intersect, setdiff, union
library(feasts)
Warning: package 'feasts' was built under R version 4.4.3
Loading required package: fabletools
Warning: package 'fabletools' was built under R version 4.4.3

Attaching package: 'fabletools'

The following object is masked from 'package:yardstick':

    accuracy

The following object is masked from 'package:parsnip':

    null_model

The following objects are masked from 'package:infer':

    generate, hypothesize
library(dplyr)

First, use this code to download the data from the USGS site.

library(dataRetrieval)
Warning: package 'dataRetrieval' was built under R version 4.4.3
# Example: Cache la Poudre River at Mouth (USGS site 06752260)
poudre_flow <- readNWISdv(siteNumber = "06752260",   # Download data from USGS for site 06752260
                          parameterCd = "00060",     # Parameter code 00060 = discharge in cfs
                          startDate = "2013-01-01",  # Set the start date
                          endDate = "2023-12-31") |> # Set the end date
  renameNWISColumns() |> # Rename columns to standard names (e.g., "Flow","Date")
  mutate(Date = yearmonth(Date)) |> # Convert daily Date values into a year-month format (e.g., "2023 Jan")
  group_by(Date) |> # Group the data by the new monthly Date
  summarise(Flow = mean(Flow)) # Calculate the average daily flow for each month
GET:https://waterservices.usgs.gov/nwis/dv/?site=06752260&format=waterml%2C1.1&ParameterCd=00060&StatCd=00003&startDT=2013-01-01&endDT=2023-12-31
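
The pipeline above averages daily values into one value per month. As a minimal sketch of how that step behaves when a daily value is missing, the example below uses a small invented daily series (the `daily` values are made up for illustration and are not USGS data). The column names `Flow_strict`, `Flow_na_rm`, and `n_days` are placeholders.

```r
library(dplyr)
library(tsibble)

# Hypothetical daily data: January at 10 cfs, February at 20 cfs (invented values)
daily <- tibble::tibble(
  Date = seq(as.Date("2020-01-01"), as.Date("2020-02-29"), by = "day"),
  Flow = c(rep(10, 31), rep(20, 29))
)
daily$Flow[5] <- NA  # simulate one missing daily reading in January

monthly <- daily |>
  mutate(Date = yearmonth(Date)) |>   # collapse each day to its year-month
  group_by(Date) |>
  summarise(
    Flow_strict = mean(Flow),               # NA if any day in the month is NA
    Flow_na_rm  = mean(Flow, na.rm = TRUE), # mean of the days that were recorded
    n_days      = sum(!is.na(Flow))         # number of days that contributed
  )

monthly
```

Whether to drop missing days with na.rm = TRUE is a judgment call; keeping a count such as `n_days` makes it easy to spot months whose mean rests on only a few readings.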

Assignment:

1. Convert to tsibble

Use as_tsibble() to convert the data.frame into a tsibble object. This will allow you to use the feasts functions for time series analysis.

pf_tbl <- as_tsibble(poudre_flow)
Using `Date` as index variable.
head(pf_tbl)
# A tsibble: 6 x 2 [1M]
      Date   Flow
     <mth>  <dbl>
1 2013 Jan  18.1 
2 2013 Feb  18.0 
3 2013 Mar   8.21
4 2013 Apr   5.94
5 2013 May 333.  
6 2013 Jun 300.  
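
The conversion above lets as_tsibble() pick the index automatically. A minimal sketch, using a small invented monthly table (the `demo` values are made up and are not USGS data), shows how the index can be named explicitly and how to check for missing months, since decomposition methods generally assume evenly spaced observations.

```r
library(tsibble)

# Hypothetical monthly data with March deliberately left out (invented values)
demo <- tibble::tibble(
  Date = yearmonth(as.Date(c("2020-01-01", "2020-02-01", "2020-04-01", "2020-05-01"))),
  Flow = c(18.1, 17.5, 40.2, 310.0)
)

# Name the index column explicitly instead of relying on the automatic guess
demo_tbl <- as_tsibble(demo, index = Date)

# has_gaps() returns a tibble whose .gaps column flags implicitly missing time points
gap_check <- has_gaps(demo_tbl)

# fill_gaps() inserts the missing month as an explicit row with Flow = NA
demo_full <- fill_gaps(demo_tbl)
demo_full
```

With the real data, the same check would presumably be has_gaps(pf_tbl).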

2. Plotting the time series

Use ggplot to plot the time series data. Animate this plot with plotly.

#Setting up for Plotting

library(plotly)
Warning: package 'plotly' was built under R version 4.4.3

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:timeSeries':

    filter
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
pf_plot <- pf_tbl |>
  autoplot() +
  geom_line() +
  labs(title = "Interactive Poudre Flow Time Series",
       x = "Date",
       y = "Flow",
       subtitle = "ESS330 A-21 | Neva Morgan")
Plot variable not specified, automatically selected `.vars = Flow`
ggplotly(pf_plot)

3. Subseries

Describe what you see in the plot. How are “seasons” defined in this plot? What do you think the “subseries” represent?

In the gg_subseries() plot, monthly flow on the Poudre River is highest in May and June, with an occasional increase in flow during April or September.

Seasons in this plot are the calendar months: gg_subseries() draws one panel per month, so months with similar flow can be compared side by side. The larger increase in flow could represent the end of spring moving into the summer months (April to September).

From what we’ve learned in class, each “subseries” is the set of values for a single month across the different years of the data, showing how flow in that month has changed from year to year.
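
The code that produced the subseries plot is not shown in this section. A minimal sketch of how such a plot might be made with feasts::gg_subseries() follows. The series is synthetic (a sine curve plus noise, not Poudre data), and the names `demo_tbl` and `sub_plot` are placeholders; with the real data the call would presumably be gg_subseries(pf_tbl, Flow).

```r
library(tsibble)
library(feasts)

set.seed(330)

# Synthetic monthly series for 2013-2023: a yearly cycle plus noise
# (invented values for illustration; not USGS data)
months <- seq(as.Date("2013-01-01"), as.Date("2023-12-01"), by = "month")
n <- length(months)
demo_tbl <- tsibble(
  Date = yearmonth(months),
  Flow = 100 + 80 * sin(2 * pi * (seq_len(n) - 3) / 12) + rnorm(n, sd = 10),
  index = Date
)

# One panel per calendar month; each panel traces that month across the years
sub_plot <- gg_subseries(demo_tbl, Flow)
sub_plot
```

Each panel also carries a horizontal line at that month's mean, which makes it easier to compare typical levels between months.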

4. Decompose

Use the model(STL(…)) pattern to decompose the time series data into its components: trend, seasonality, and residuals. Choose a window that you feel is most appropriate to this data…

# Making the Decomposition model
pf_decomp <- pf_tbl |>
  model(STL(Flow ~ season(window = "periodic"))) |>
  components()

# Visualizing with autoplot
autoplot(pf_decomp) +
  labs(title = "STL Decomposition of Poudre River Flow", 
       y = "Flow") +
  theme_minimal()

ggpubr::ggdensity(pf_decomp$remainder, main = "Residual Component")

shapiro.test(pf_decomp$remainder)

    Shapiro-Wilk normality test

data:  pf_decomp$remainder
W = 0.84083, p-value = 1.25e-10

Describe what you see in the plot. How do the components change over time? What do you think the trend and seasonal components represent?

After running a few extra tests to understand how the flow of the Poudre River changed from 2013 to 2023, the panel that stood out most to me was the remainder (residual) panel. From what I can understand of the data presented, flow follows a regular annual cycle, peaking around May and June and staying low and flat in the months between. The flow has also changed over time: it was highest in the later months of 2015, which may indicate heavier rainfall or more water accumulating in the river that year. The remainder also dips below zero in places. Flow itself was not negative there; a negative remainder means flow was lower than the trend and seasonal components together would predict, which could indicate a drier or drought year in terms of precipitation.

From what I understand from class and my other courses, the trend component is a smoothed average of flow across the years, showing how flow fluctuated in earlier years compared with the decrease we see now. The seasonal component shows the consistent rise in flow that begins between March and May each year, as water changes its physical state and moves down through the watershed into the Poudre River.
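
The decomposition above fixes the seasonal pattern with window = "periodic". As a sketch of how the window choice matters, the example below compares a periodic seasonal component with one smoothed over a finite window (7 is an arbitrary odd value chosen for illustration) on a synthetic series whose seasonal swing grows over time. The data are invented, not Poudre data, and the names `fits`, `comps`, and `seasonal_spread` are placeholders. It also checks the additive identity Flow = trend + seasonal + remainder.

```r
library(tsibble)
library(feasts)
library(fabletools)

set.seed(330)
months <- seq(as.Date("2013-01-01"), as.Date("2023-12-01"), by = "month")
n <- length(months)

# Synthetic series: the seasonal amplitude grows each month,
# so the underlying seasonal pattern is not fixed (invented values)
demo_tbl <- tsibble(
  Date = yearmonth(months),
  Flow = 100 + (40 + 0.5 * seq_len(n)) * sin(2 * pi * seq_len(n) / 12) +
    rnorm(n, sd = 5),
  index = Date
)

# Fit both decompositions side by side
fits <- demo_tbl |>
  model(
    periodic = STL(Flow ~ season(window = "periodic")),
    flexible = STL(Flow ~ season(window = 7))
  )
comps <- components(fits)

# Additive identity: the three components should sum back to the series
max_gap <- max(abs(comps$Flow - (comps$trend + comps$season_year + comps$remainder)))

# Year-to-year spread of the seasonal value for each calendar month, averaged
# over months; zero means the seasonal pattern repeats exactly every year
month_id <- format(as.Date(comps$Date), "%m")
spread <- tapply(comps$season_year, list(comps$.model, month_id), sd)
seasonal_spread <- rowMeans(spread)
seasonal_spread
```

A periodic window forces the same seasonal shape every year, so any year-to-year change in the size of the spring peak ends up in the trend or remainder. A finite window lets the seasonal component drift, at the cost of absorbing some of what might otherwise be read as unusual years.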

Submission:

Upload a rendered qmd file to the course website. Make sure to include your code and any plots you created.

This should be an HTML file with self-contained: true.

It should not point to a local host, and must be the physical file.

Make sure to include your code and any plots you created and that the outputs render as you expect.